Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
- For example,
"ace"is a subsequence of"abcde".
Example 1:
Input: s = "abcde", words = ["a","bb","acd","ace"] Output: 3 Explanation: There are three strings in words that are a subsequence of s: "a", "acd", "ace".
Example 2:
Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"] Output: 2
Constraints:
1 <= s.length <= 5 * 1041 <= words.length <= 50001 <= words[i].length <= 50sandwords[i]consist of only lowercase English letters.
Average Rating: 4.47 (45 votes)
Approach #1: Brute Force [Time Limit Exceeded]
Intuition and Algorithm
Let's try to check separately whether each word words[i] is a subsequence of S.
For every such word, we try to find the first occurrence of word[0] in S, then from that point on try to find the first occurrence of word[1], and so on.
Complexity Analysis
-
Time Complexity: O(words.length∗S.length+∑iwords[i].length). For each word, our
subseqcheck on wordwords[i]may take time complexity O(S.length+words[i].length). -
Space Complexity: O(1). (In Java, our space complexity is O(S.length+max(words[i].length)), but can be made to be O(1) with a pointer based implementation.)
Approach #2: Next Letter Pointers [Accepted]
Intuition
Since the length of S is large, let's think about ways to iterate through S only once, instead of many times as in the brute force solution.
We can put words into buckets by starting character. If for example we have words = ['dog', 'cat', 'cop'], then we can group them 'c' : ('cat', 'cop'), 'd' : ('dog',). This groups words by what letter they are currently waiting for. Then, while iterating through letters of S, we will move our words through different buckets.
For example, if we have a string like S = 'dcaog':
heads = 'c' : ('cat', 'cop'), 'd' : ('dog',)at beginning;heads = 'c' : ('cat', 'cop'), 'o' : ('og',)afterS[0] = 'd';heads = 'a' : ('at',), 'o' : ('og', 'op')afterS[0] = 'c';heads = 'o' : ('og', 'op'), 't': ('t',)afterS[0] = 'a';heads = 'g' : ('g',), 'p': ('p',), 't': ('t',)afterS[0] = 'o';heads = 'p': ('p',), 't': ('t',)afterS[0] = 'g';
Algorithm
Instead of a dictionary, we'll use an array heads of length 26, one entry for each letter of the alphabet. For each letter in S, we'll take all the words waiting for that letter, and have them wait for the next letter in that word. If we use the last letter of some word, it adds 1 to the answer.
For another description of this algorithm and a more concise implementation, please see @StefanPochmann's excellent forum post here.
Complexity Analysis
-
Time Complexity: O(S.length+∑iwords[i].length).
-
Space Complexity: O(words.length). (In Java, our additional space complexity is O(∑iwords[i].length), but can be made to be O(words.length) with a pointer based implementation.)
September 5, 2019 9:11 AM
I really hope I could come up with such an answer in real interviews. gj!
June 9, 2019 10:56 AM
[Next Letter Pointers] is really a nice solution.
Based on the shown implementation, the first letter is already removed from dict since not needed. Therefore, I feel the explanation should be adjusted as follows:
// at beginning, heads = 'c' : ('at', 'op'), 'd' : ('og')
// after S[0] = 'd', heads = 'c' : ('at', 'op'), 'o' : ('g')
// after S[1] = 'c', heads = 'a' : ('t'), 'o' : ('g', 'p')
// after S[2] = 'a', heads = 'o' : ('g', 'p'), 't': ('')
// after S[3] = 'o', heads = 'g' : (''), 'p': (''), 't': ('')
// after S[4] = 'g', heads = 'p': (''), 't': (''), found '' in g, so res++
March 5, 2018 8:04 AM
@awice
Can you explain how complexity is O(S.length+∑iwords[i].length), not O( S.size() * #of words) ?
The space complexity seems O(# of words). How could pointer based implementation give O(1) space?
Time Complexity: O(words.length∗S.length+∑iwords[i].length)O(\text{words.length} * S\text{.length} + \sum_i \text{words[i].length})O(words.length∗S.length+∑iwords[i].length). For each word, our subseq check on word words[i] may take time complexity O(S.length+words[i].length)O(S\text{.length} + \text{words[i].length})O(S.length+words[i].length).
Isn't for each word, subseq check on word words[i] has time complexity O(S.length). Why O(S.length+words[i].length) ?
February 15, 2021 11:57 AM
Here is a simplified version of this solutions without messing with iterators and ord/chr:
class Solution:
def numMatchingSubseq(self, S: str, words: List[str]) -> int:
n_subsequences = 0
state = defaultdict(list)
for word in words:
state[word[0]].append(word[1:])
for char in S:
old_bucket = state[char]
state[char] = []
for suffix in old_bucket:
if not suffix:
n_subsequences += 1
else:
state[suffix[0]].append(suffix[1:])
return n_subsequences
a day ago
Shouldn't the time complexity for Approach#2 be O(s.length() * words.length)? Considering the case where all words are the same: outer for loop takes s.length(), inner for loop takes words.length.
Why not to use Trie instead of buckets ?
could someone please help me understand the time complexity for the second solution?
Why are we adding the terms in the big O and not multiplying them? I got
O(|S| * #words) - can someone help?
there's no reason to do old_bucket.clear(), removing that will improve performance
July 29, 2019 12:31 PM
This solution is really good! Many thanks
Time Submitted | Status | Runtime | Memory | Language |
|---|---|---|---|---|
| 06/23/2021 06:32 | Accepted | 172 ms | 48.1 MB | cpp |
| 06/15/2021 12:00 | Accepted | 180 ms | 48 MB | cpp |
| 05/25/2021 20:07 | Accepted | 168 ms | 48 MB | cpp |
| 05/25/2021 20:02 | Time Limit Exceeded | N/A | N/A | cpp |
| 06/23/2020 16:18 | Accepted | 412 ms | 44.1 MB | cpp |
| 06/23/2020 16:16 | Wrong Answer | N/A | N/A | cpp |
xxxxxxxxxxclass Solution {public: int numMatchingSubseq(string S, vector<string>& words) { vector<vector<int>> cmap(26); for (int i = 0; i < S.length(); i++) { cmap[S[i] - 'a'].push_back(i); } int res = 0; for (auto& word : words) { int i = 0, n = word.length(), j = -1; while (i < n) { int c = word[i] - 'a'; if (cmap[c].empty()) break; auto it = upper_bound(begin(cmap[c]), end(cmap[c]), j); if (it == cmap[c].end()) break; j = *it; i++; } res += (i == n); } return res; }};